The Hardness of Approximation of Euclidean k-Means
Authors
Abstract
The Euclidean k-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning, and computational geometry communities. In this problem, we are given a set of n points in Euclidean space R^d, and the goal is to choose k center points in R^d so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial-time constant-factor approximation for general k and a (1 + ε)-approximation which runs in time poly(n) exp(k/ε). At the other extreme, the only known computational complexity result for this problem is NP-hardness [1]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in R^d can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all k, d. In this paper we provide the first hardness of approximation result for the Euclidean k-means problem. Concretely, we show that there exists a constant ε > 0 such that it is NP-hard to approximate the k-means objective to within a factor of (1 + ε). We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest number of vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this, we transform G, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph H, and show that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.
1998 ACM Subject Classification: F.2.2 Nonnumerical Algorithms and Problems
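To make the objective concrete, here is a small illustrative sketch (not from the paper) that evaluates the k-means cost of a candidate set of centers; the function name kmeans_cost and the toy data are placeholders of my own choosing, and NumPy is assumed to be available.

```python
import numpy as np

def kmeans_cost(points: np.ndarray, centers: np.ndarray) -> float:
    """Euclidean k-means objective: the sum, over all points, of the
    squared distance from the point to its nearest center.

    points:  (n, d) array of input points in R^d
    centers: (k, d) array of candidate centers in R^d
    """
    # Pairwise differences, shape (n, k, d), then squared distances, shape (n, k).
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = np.einsum("nkd,nkd->nk", diffs, diffs)
    # Each point is charged to its closest center.
    return float(sq_dists.min(axis=1).sum())

# Toy example: n = 5 points in R^2 and k = 2 candidate centers.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
ctrs = np.array([[0.5, 0.5], [5.5, 5.0]])
print(kmeans_cost(pts, ctrs))
```

The hardness result above says that, for some fixed ε > 0, no polynomial-time algorithm is guaranteed to find centers whose cost is within a (1 + ε) factor of the optimum (unless P = NP); the sketch only evaluates the objective, it does not optimize it.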
Similar Resources
Approximation Algorithms for Bregman Clustering, Co-clustering and Tensor Clustering
The Euclidean K-means problem is fundamental to clustering and over the years it has been intensely investigated. More recently, generalizations such as Bregman k-means [8], co-clustering [10], and tensor (multi-way) clustering [40] have also gained prominence. A well-known computational difficulty encountered by these clustering problems is the NP-Hardness of the associated optimization task, ...
Full text
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering (10 Feb 2009)
In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9, 17], and tensor clustering [8, 32]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximat...
Full text
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering (arXiv:0812.0389v3 [cs.DS], 15 May 2009)
In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9, 18], and tensor clustering [8, 34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximat...
Full text
Approximation Algorithms for Bregman Co-clustering and Tensor Clustering
In the past few years powerful generalizations to the Euclidean k-means problem have been made, such as Bregman clustering [7], co-clustering (i.e., simultaneous clustering of rows and columns of an input matrix) [9, 18], and tensor clustering [8, 34]. Like k-means, these more general problems also suffer from the NP-hardness of the associated optimization. Researchers have developed approximat...
Full text
Hardness and Non-Approximability of Bregman Clustering Problems
We prove the computational hardness of three k-clustering problems using an (almost) arbitrary Bregman divergence as dissimilarity measure: (a) The Bregman k-center problem, where the objective is to find a set of centers that minimizes the maximum dissimilarity of any input point towards its closest center, and (b) the Bregman k-diameter problem, where the objective is to minimize the maximum ...
Full text
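As a rough illustration of the k-center objective described in the last entry above, the following sketch (my own, not taken from any of the cited papers) evaluates the max-over-points, min-over-centers dissimilarity, instantiated with squared Euclidean distance, which is the Bregman divergence generated by φ(x) = ||x||². The names squared_euclidean and k_center_cost are placeholders.

```python
import numpy as np

def squared_euclidean(x: np.ndarray, c: np.ndarray) -> float:
    # Squared Euclidean distance: the Bregman divergence generated by
    # the convex function phi(x) = ||x||^2.
    return float(np.sum((x - c) ** 2))

def k_center_cost(points: np.ndarray, centers: np.ndarray, divergence) -> float:
    """k-center objective: the maximum, over all input points, of the
    dissimilarity from the point to its closest center."""
    return max(min(divergence(x, c) for c in centers) for x in points)

# Toy example: the worst-served point is [1, 0], at squared distance 1.0.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 4.0]])
ctrs = np.array([[0.0, 0.0], [4.0, 4.0]])
print(k_center_cost(pts, ctrs, squared_euclidean))  # 1.0
```

Plugging in a different Bregman divergence (e.g. KL divergence for points on the probability simplex) changes the geometry but not the max-min structure of the objective.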